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Inference about the tail of a Distribution. 
Improvement on the Hill Estimator 

Jean Nuyts [] 



Abstract 



The Hill estimator is often used to infer the power behavior in tails of 
experimental distribution functions. This estimator is known to pro- 
r~~ ' duce bad results in certain situations which have lead to the so-called 

H 

C/3 Hill horror plots. In this brief note, we propose an improved estimator 

f^ which is simple and coherent and often provides an efficient remedy 

-h in the bad situations, especially when the distribution is decreasing 

i— i slowly, when the data is restricted by external cuts to lie within a 

t— i finite domain, or even when the distribution is increasing. 
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1 Introduction 

It has been advocated that self-organization, first discovered to dominate 
sand pile formation [1J, may very well apply to many financial, economic, 
traffic control or social phenomena. The general outcome of these systems is 
in general an asymptotic power like behavior of the experimental distribution 
x~^ in some variable x. The value of the critical exponent jjl is often a major 
prediction of the models. Let us also mention models leading to Pareto 
[2], Zipf [3], Levy-flight [I] or Pade [5] type distributions which also provide 
asymptotic power like behaviors with some definite critical exponent. 

A particularly simple estimator of the critical exponent for a set of data 
measures [6] has been provided by Hill \7\. It has proved to be very useful and 
has been widely used. See for example the KU Leuven web page [8] . Follow- 
ing Hill, the estimator is first applied to a subset of measures corresponding 
to the highest values of the variable. In order to increase the statistics, this 
subset is then extended to include progressively further experimental values 
of the variable, every set providing a value of fi. These values are then plot- 
ted in a so-called Hill plot [9]. The subset is extended as long as it includes 
values which can be considered to lie in the asymptotic domain where the 
distribution is power like. The values obtained when they are stable provide 
the best value of the Hill estimator, as usually seen on the Hill plot. It is 
then obvious that the estimator usually takes into account all the highest 
experimental values of the variable larger than a certain value x > L. The 
data points with x < L do not belong to the asymptotic region or turn out 
to be unsafe to use. 

Various properties, aspects and generalizations of the Hill estimator have 
been discussed and studied in numerous publications. One should mention 
that many questions about the asymptotic properties, about asymptotic nor- 
mality, and about the volatility of the index have been addressed (see for 
example [10], [TTJ, [12], [13], [H]). Some authors have proposed, in order to 
improve the high volatility of the Hill plots, to smooth the result by averaging 
the Hill estimator values corresponding to different numbers of order statis- 
tics [15J. Other authors have smoothed the Hill estimator by convoluting the 
experimental random variables by a kernel function together with a band- 
width parameter [16]. An optimal choice of the kernel and of the parameter 
improves the situation greatly. 

However, it should be stressed that, in many cases, experimental distribu- 
tions are related to discrete phenomena, which carry their own natural limits. 



In other cases, large values in the data may be biased, unsafe or unreachable 
for technical reasons. Let us give two examples. 

One example of considerable importance relates to the evaluation of risk 
in finance, in particular those related to the variations of interest rates. Start- 
ing from their known daily variation in the past, one may try to evaluate the 
probability of a major overnight variation of say 3 or 4 percent in the fu- 
ture. Using a normal law, as often used by risk managers, the catastrophe 
should occur of the order of once every 10,000 years. Starting from the same 
available data but with a power law this may happen tree or four times per 
century. The precise evaluation of the parameter /i is obviously of paramount 
importance. In data of the Federal Reserve System, if any day the variation 
is larger than a certain predefined limit the quotation is suspended. Hence 
the data is artificially cut for high x. The relevant Xi must be restricted in 
a finite domain extending from some value L where asymptotics begins to 
some value R where the data is artificially cut. 

Another example occurs in sociology. Suppose that some phenomena de- 
pends on the number of inhabitants of cities, it is clear that no data on earth 
today will be obtained for towns larger than say thirty millions inhabitants 
or smaller than say ten inhabitants. The available data will be restricted to 
a finite domain. 

To conclude, in various cases, the variable is not only discrete but also 
restricted to lie within a finite domain Di < x < Dr. Data outside the 
domain is either not available or unsafe to use or does not correspond to 
the asymptotic region. Remark that the domain may not only have a lowest 
(left) bound D L but also a highest (right) bound D R . Though the lower 
bound Dl is taken into account by the Hill estimator, the possible presence 
of the highest bound Dr has not been addressed by the Hill. 

In this paper, we thus wish to provide a very simple improvement of the 
Hill estimator (and of the Hill plot) which takes into account in a perfectly 
symmetrical way of a lowest value L (Dl < L) and of a highest value R 
(R < Dji) defining a safe domain [L, R] where the power behavior is at work, 
where data exists, and from which the critical exponent should be inferred. 

The problem of existing limits on the data has been subject to a much 
lower scrutiny than the Hill estimator itself. We would like to cite the work 
of Beirlant and Guillou and coworkers [17], [18] dealing in particular with 
insurance policies where policy provisions (deduction, limits) constrain the 
data. They suppose censored data i.e. that the number of events above the 
constrain value is known and discuss the influence of censoring. This is not 



exactly the aim of our article. 

We have focused our attention on data which exist in a finite range only 
and have obtained the simplest estimator which takes into account in a sym- 
metrical way the lower and higher bounds of the domain. 

Finally, we would like to mention the related problems of variance and 
bias of the estimators and especially the question of their experimental de- 
termination. For the Hill estimator, this is delicate and it has given rise 
to much research as is testified by numerous references [19], [20], [21], [22J. 
Starting from the basic definition of the variance-bias parameters applied to 
the improved estimator, a detailed discussion following the paths set up for 
the Hill estimator are certainly worth further work and publications, both in 
theory and when actual data, which carry their own uncertainty, are used. 
Indeed, the experimental precision of each data point is not always easy to 
determine, may in fact depend of the value of the Xi itself and has to be 
taken into account precisely. On the other hand, in simulations, the result 
depends on the details of the random number generators. 

2 The Hill Estimator 

One supposes that, based on some theoretical model, one predicts a distri- 
bution density f(x) which behaves asymptotically as 

f(x) ~ g(x) x~^ when x — > oo (1) 

where / and x are real and the critical exponent \x is a positive real param- 
eter. When x — > oo, the function g(x) is supposed to be a rather smooth 
function which, for large x (say x > L), is often assumed to become essen- 
tially a constant A. In those cases and within a certain margin of error, the 
distribution is thus approximated in the form 

/(a;)~A:r~ M when x — > oo (2) 

which is scale free. As explained in the introduction, it has been conjectured 
by different authors that this type of power like distributions and in particular 
those based on self-organized critical models, rather than the often used 
Gauss like exponential forms, could very well dominate certain financial, 
economic or social phenomena. 

Let Yi (Yi > 0) be a random sample obtained from experimental data for 
a phenomenon which is supposed to follow a distribution law / satisfying 



the requirements (TTT) and/or ff2§. The question is to draw inference on the 
critical exponent ft from the random sample. 

This most important question was discussed very carefully by Hill [7j 
both from a Bayesian and a frequentist approach. He showed that, in a first 
approach, both points of view lead to the same very useful answer. His recipe 
for estimating /i can be outlined as follows. 

• Let the set Xj be the set Yj reordered (reversed order statistics) in such 
a way that 

X t > Xj for i < j (3) 

i.e. the set X, is ordered in a decreasing fashion, X\ being the highest 
Y value. 

• Construct the sets of numbers H^, a.k and \x^ for k > 2. The set Hk is 

defined by 

1 k 

if* = v X>(*» - ln(X*) (4) 

which is identical to the set proposed by Hill (see [7j) 
1 A, ,„, r 



H r . 



- J2 H*i) - —T ln (^+i) (with H r = H r+1 ) . (5) 

1 ~* T + 1 



r + lpl ' r- I 



The set otu is defined by 

a k = 1/H k (6) 

and the set //& defined by 

Hk = oik + 1 • (7) 

For quite many distributions, Hill showed that \X). is an estimator of \x 
improving when k is increased until "it seems unwise to proceed". It 
is the point Xk ~ L at the left of which the form of the actual dis- 
tribution is not anymore approximated safely enough by the equation 
pi). In other words, this occurs when the variable x is not large enough 
anymore to be in the asymptotic region of the distribution. Both the 
phenomenon and the limitation can easily be seen by constructing spe- 
cific examples. 



Unfortunately, there are simple but somewhat more subtle distributions 
for which the set /i^ does not converge well when k is increased. This type 
of situation has led to what has been called "horror Hill plots" . Such horror 
plots can easily be constructed by considering simple examples or by looking 
at page 194 of the reference [23] • As argued in the introduction, there are also 
cases when the discrete experimental distribution carry their own domain, 
bounded on the left and on the right Dl < x < Dr. 

In this short note, we show how to remedy some of these unfortunate situ- 
ations in a very straightforward way. We construct some numerical examples 
to show this explicitly. This result will be achieved by taking into account, 
not only a left boundary L but also a right boundary R. It is supposed that 
the data cannot be trusted and/or is not power like at the left of L or at the 
right of R. 

3 Improvement on the Hill estimator 

We first derive a simple heuristic formula and then show how to apply it to 
improve the Hill estimator. 

3.1 A simple and exact formula 

Let us first suppose that f(x) is exactly, for x > 0, of a power law form 

f(x) = \x~» (8) 

with an arbitrary normalization constant A. Take two arbitrary positive 
numbers < L < R and define the average value < ln(x) >lr of ln(x) on 
the interval [L, R] as 

Jl f{x)dx 



After some algebra, defining 
and for later convenience 



fi = a + l (10) 



H=l, (11) 



one finds, from d8|), the exact relation 



, . . 1 In (L )L~ a - In (R )R~ a 

< In x > LR = - + — ^- -^ (12) 

v ' LR a L~ a - R- a 

which depends on the correction function C(a, L, R) 

C {a ,L, R) = "^^Mg^ . (13) 

Equation ( |i~2| is our basic equation which is exactly valid for an exact power 
law distribution (J8| and approximately correct for a distribution which sat- 
isfies (jlj and/or (J2J). 

3.2 The basic Hill Estimator 

The basic Hill estimator is obtained when the upper limit R is chosen at 



infinity Indeed, when \i is larger than 1 and R — > <x>, Equation (12) reduces 
to 

< ln(x) > Loo = H + ln(L) . (14) 



The formula (14) can easily be used to draw inference for a and fi from 
the k highest values of the random sample X with the chosen order statistics 
which we consider as the conditional event. Take < ln(x) >lr as the exper- 
imental average value of ln(x) from L = Xk to the highest empirical value 

1 k 
<ln(x)> Loo = vT,HXj) (15) 

1 k 

tJ^HXj) = H k + \n(X k ). (16) 

i=i 

We find the formula for the Hill H k estimator Q exactly 

Remark that by using a Simpson rule an experimental average, slightly 



better than (15), is obtained by 

< ln(x) > Loo = hx 1 + X k ) + l y £ HXj). (17) 

1 fc j=2 



The difference for the Hill estimator using the slightly better (17) rather than 



(15) is usually too minute to care. 



3.3 The Improvement 



From the equation Q12J) , we see that the Hill estimator can be improved by 
taking into account not only a lowest value L = Xi in the reduced empirical 
set {Xi,X L _i, . . . ,X r } but also a highest value R = X r . Taking again a 
sample estimate for < ln(x) >lr in the reduced set, one obtains the solution 
aw of 

T^TTI ^ ln ™ " ^ + x r - - xr- (18) 

as an inference for \x 

fJt lr = 1 + a Lr ■ (19) 

This obviously provides an easy generalization of the Hill procedure. It 
makes sense when it is known theoretically that the density function f(x) 
behaves as x~^ times a constant A or times a slowly varying function g[x) 
and that the random sample is secure for Dl < L < x < R < Dr. It takes 
into account the facts that the exact theoretical form of f(x) is not known 
on the left of x = L and that the data points do not extend beyond R = X r 
because of limited statistics or when the data is poorly known outside a finite 
Dl, Dr domain. In this case, the set of Xj derived from the random sample 
Y should include the values inside the interval only and thus take the Xj's 
between the two limit points (L < X; < Xj < X r < R) into account. 

The generalised Hill plot is obtained by taking first / = r + 1, then 
increasing I until "t seems unwise to proceed" and plotting the /i/ r as a 
function of I, Otherwise I can be increased and r decreased until "it seems 
unwise to proceed" . When the data is not biased for the largest values of Xj, 
the choice r = 1 is optimal. 

It is finally worth noting the very important fact that our basic equation 



(18) is perfectly symmetrical under the exchange X; <-> X r . As a result, it 
will be apply and provide meaningful answer for distribution which increase 
(fi negative) rather than decrease with x, as will be seen in the examples. 



Our fundamental equation (18) is transcendental and hence requires a 



further treatment. The estimation of the value of ay inferred from (18) can, 
for example, be obtained numerically in the two following ways 

Method (1): Direct Evaluation. By using standard ad hoc computer pro- 



grams, the equation ( 18 ) can be solved numerically for a^ 



Method (2): Iteration. A simple way to achieve the same result is as fol- 
lows. Define the function D(a, L, R) as the derivative of the correction 
function C(a, L, R) with respect to a 



D(a,L,R) = ~C(a,L,R) 

R a L a (\n(L) - \n(R)f 
(L a - R a ) 2 

Take the first order a lr as the Hill solution 



(20) 



«E ] = ^ l(x) (21) 

and define the successive approximations af r , p = 2, ... by iteration as 



[p+i] [p] 
at = aft 



I m EU^) >Wn> ] X x^ i\ 
a lr — ^rp^ a lr <^{a lr ,JL h X r " 

H 2 n^>l 



^ aj« D« J ,^,X r 



(22) 



/ 



In the right hand side, the correction function C and its derivative 
D are estimated for the value of a of the preceding iteration. Then 
af/ — > ai r for p — > oo. Empirically, for [p] as low as [4] or [5], the 
approximation for a\ r and hence for /ij r = a^+1 is already very good. 

4 Theoretical comparison between the Hill 
estimator and the new estimator 

When should the new estimator obviously be preferred to the Hill estimator? 
We remark that the latter (see Eq.Q) depends critically on two quantities 
only. These are the average value of the logarithm of the data points and 
the smallest Xi (highest /) included in the sample. The improved estimator 
depends on these two quantities but also crucially on the third one: the 
highest attained value X r . 

Let us justify this by discussing more carefully than in the introduction 
three examples of situations, pertaining to three completely different domains 
of research (sociology, economy, high energy physics), where both limits Xi 
and X r should be taken into account. 
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Suppose that some sociological theory predicts an asymptotic power 
law for some phenomenon (crime rate, economic growth, power con- 
sumption) as a function of the number of inhabitants N- m h in towns, 
either when this number is high or when it is small. Obviously, since 
every country has a finite number of inhabitants, the number of people 
in any city on earth is less than some number, actually 11.914 x 10 6 
inhabitants for Bombay (India). There can be no data for larger num- 
bers. The data is cut artificially on this higher side essentially by the 
fact that the earth has a finite population. On the other end of the 
statistics, when iV in h is small, no aggregation of human beings is called 
a city if it contains less than say about ten inhabitants, depending on 
the rules for defining towns in different countries. Even more, no town 
will ever be defined with a fraction of one inhabitant. Hence the data 
is cut on the lower side by country depending administrative decisions. 

Suppose that some economic theory predicts a power law for the daily 
variation of the interest rates (or of the price of some share) when they 
are large (the tail of the variations). Usually, if this variation on some 
day exceeds some predefined number the floor has rules to suspend the 
quotation. Hence no data will ever be produced with higher variations. 
This artificial limit, which is usually not taken into account in the 
models, is included in the analysis of the data by the dependence of 
the new estimator in the highest value X r . 

In high energy physics, when two beams of initial particles are made 
to scatter head on at very high speed along a certain direction, final 
particles emerge from the scattering region at an angle with respect to 
the direction of the initial beams. The probability distribution is often 
plotted as a function of the transverse momentum which is related to 
this angle. The interaction region is surrounded by detectors. In the 
forward and in the backward directions there is a blind cone where the 
scattered particles cannot be identified among those of the intensive 
initial beams. Hence there are artificial limits both at low and at high 
transverse momenta outside which no data is produced because of the 
physics of the measuring devices. When the tails of the distributions of 
the scattered particles with respect to beam are studied, these artificial 
limitations have to be taken into account. 

In practical situations, to obtain the improved estimator the parame- 



ters bounds X[ and X r , which limit on the left and on the right the 
data points, have to be chosen in an appropriate manner. There is, a 
priori, no unique method for choosing them. For actual phenomena, 
related to some definite model (for example to solutions of a differ- 
ential equation), the model itself should provide some information on 
the bounds. The onset of the asymptotic domain of the model pro- 
vides an indication on X\. On the other hand, the measure themselves 
may carry a natural right bound X r where the measurements become 
meaningless (for example a town of more than 5 x 10 7 inhabitants). The 
estimator should be evaluated for various values of the bounds X h X r 
and educated guesses have to be applied. In general, it is convenient to 
simply use the highest data point as X r unless there are good reasons 
to reject it. In high energy physics, "good experimentalists" are known 
to apply suitable and correct cuts to their data, depending on a deep 
inside knowledge of their apparatus. 

5 Tests of the new Estimator 

In order to test the new estimator, we have used it for some trial functions. 
The new estimator performs as well or even slightly better than the original 
Hill estimator in cases where the latter is known to be good. When applied 
to cases where the Hill estimator is known to perform more poorly, the im- 
provement is often spectacular and the new estimator converges to a much 



better approximate value. Using the method (2) above (see Eq.(22)), a good 



value ai r is usually obtained after three steps only and always after four steps 
i.e. from a\ r ~ cqj. 

Let us give here a few examples where we have used the following proce- 
dure. The examples are divided in sets. The examples (1)-(13) are related 
to very simple distributions where the best value of \i is obtained by letting I 
become the largest available value in the data I = X xaxi & and r the lowest i.e. 
r = 1. In the examples (14)- (17), the Hill plot is drawn for somewhat more 
complicated distributions where the deviation from a purely power behavior 



shows up for small x. In all the cases, r = 1 (see (18)) has been chosen 



1. We have used FORTRAN to do numerical calculations 

2. We have used the DRNGCS module of the International Mathematical 
and Statistical Libraries (IMSL) to generate randomly a certain num- 

10 



ber iV rand of X iy inside a predefined interval [D L , D R ] and following a 
given distribution P. It should be noted that the normalization of P is 
inessential. As required by DRNGCS, this normalization is chosen in 
such a way that the cumulated distribution N(y) 

My) = [ y P(x)dx (23) 

Jd l 

has N(Dr) = 1. In our examples, the distribution P and the cumulated 
distribution N have always been defined on a grid of 1000 to 10,000 
points regularly separated between Dl and Dr. This is precise enough 
for our purpose. 

3. For each example (1)-(13), we give the results in a table which includes 

• The trial distribution P, up to an arbitrary factor, fed to the 
random generator. 

• The predefined limits Dl and Dr chosen for the data Dl < Xi < 
Dr produced by the random generator and ordered according to 
Eq.Q. 

• The number iV rand of randomly produced X^ 

• The smallest value L = Xm and highest value R = X\ obtained 

J ■ run *— ' - 1 - 

from the random generator. Obviously Dl < L < R < Dr. 

• The average value a of the natural logarithm of the Xi as produced 
by the random generator. 

1 -''ra.nr! 



a 



-''rand j = i 



E MX,) . (24) 



The value expected for /i, the exponent of the power decrease of 
the starting distribution P. 

The value produced by the Hill estimator /i H m = /i^ = aW+1 



(see Eq.Q). 

The values fj} 5 ^ produced by the 4 d iteration. Quite generally, the 
values /i' 4 l produced by the 3 d iteration are already very good 
except in example (13) below where we had to proceed to /x' 5 '). 
In almost all cases (except in example (13)), it turns out that 
fj}^ ~ fjft for j > 4 with at least four significant digits well within 
the errors of the method. 
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Example (1) 

Let us first give an example where the old and the new estimators perform 
both well. We have supposed a power law (Tsj) with [i = 5, Dl = 3 and 
Dr = 150. In this case, Dr is large enough and R can be essentially ignored 
in all the formulas. Following the procedure outlined above we obtain the 
line (1) of the table. We see that in the values produced by the Hill estimator 
and by the new estimator are almost equal and close to the expected value 
fj, = 5. The new estimator is slightly better. The same behavior is found 
when fj, and the related Dr are large enough. 

Example (2) 

The second example is almost identical to the first example except that the 
data is artificially reduced to the interval Dl = 3, Dr = 4. We find the line 
(2) of the table. Here we see that in this extreme case, where the interval 
allowed has been reduced drastically, the Hill estimator has produced a very 
bad result //hui ~ 10 while the new estimator converges toward a much better 
value close to the expected one. 

Example (3) 

The third example in line (3) of the table is identical to the second except 
that the number of X iy N Tan ^ is increased to 5000. The new estimator behaves 
even better while the Hill one is still very bad. This behavior was expected. 
It should be noted that, when Dr is increased from 4 to over 100, the two 
estimators tend gradually to produce compatible results. There is a weak 
dependency in the value chosen for iV rand provided that it is chosen large 
enough. 

Examples (4), (5), (6) 

When the probability distribution is chosen to decrease very slowly, here 
P = X/y/x, even if D R is chosen rather large D R = 150, D R = 1500 or 
Dr = 15000, we see in line (4), (5) and (6) of the table that the new estimator 
is definitively better than the Hill one. An increase of iV ran d and/or of Dr 
does not alter this fact drastically. 

Examples (7), (8) 

Let us now give an example where the initial distribution involves a slowly 
varying function as the consequence of a Pade(l,4) type distribution in the 
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positive x region x > 



P = A — - . (25) 



We expect a \x = 4 type behavior provided that x is chosen sufficiently large. 
The actual left limit L to be used depends on the values of the parameters p,. 
We analyze the problem in the case of an experimental symmetric distribution 
of variation of interest rates of a Pade type (see [5] and especially the results 
obtained in the first reference). Data which were extracted from the Federal 
Reserve System lead to p 2 = 494.7 and p 4 = 4886.0 for a lag of one day 
and a maturity of one year. Here, x is expressed in the convenient unit of 
percent/year. The parameters Dr and Dr as well as L and R are given 
percent/year. It is easy to see that for the term in p 4 to dominate over the 
term p 2 in the denominator, a value of x of about one percent in needed. 
Hence L has to be chosen greater than one percent. We do not expect a 
x~ 4 behavior for smaller values. In lines (7) and (8) of the table, the left 
boundary Dl is taken to be one percent while the right one Dr, when the 
floor would be supposed to suspend the quotation, is respectively two and 
five percent. 

Examples (9), (10), (11), (12) 

Let us now turn to two examples when there is a logarithmic factor, namely 

/-a"^ (26) 

X 

and when there is an inverse logarithmic factor 

la(x)x 

In both cases the distribution decreases rather slowly so that taking into 
account the right boundary has usually a major effect. In these cases , we 
have increased the value of iV rand to 5000. The results depend rather weakly 
on this choice. In lines (11) and (12) of the table, the presence of the ln(x) 
factor in the denominator leads to an effective decrease of the probability 
faster than 1/x. Hence, the effective /i^ is expected and turns out to be 
somewhat higher than one. In the opposite way, in lines (9) and (10), when 
the logarithm appears in the numerator, the effective /j} 5 ^ is expected to be 
somewhat lower than one and it is. 
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Example (13) 

Since the treatment of the new estimator ( 18 ) is perfectly symmetrical under 
the exchange X\ -h- X r of two limits X r and Xi, it can be applied directly 
without any change to infer the asymptotic power behavior of distributions 
which increase with x and hence have a power behavior with negative bi up 
to a smooth multiplication function. Let us give a last example for /i = —3.5, 
Dl = 3.0 and Dr = 10000.0. Obviously the Hill estimator produces a wrong 
sign for /imii- The new estimator is perfect but the iterations had to be 
carried effectively to the fourth level since the starting /iM = /i Hill is so bad. 

Example (14). Figure 1 

In figure 1, we show the original Hill plot and the improved plot related to 



the Pade example (8). The parameters in Eq. 25 are again p2 = 494.7 and 
P4 = 4886.0. Here we have chosen Dl = 1 percent and Dr = 3 percent 
and generated 2000 data points. The continuous line is the original Hill 
plot, the dotted line is the improved plot and the dashed line is the expected 
asymptotic value /x = 4. 

Example (15). Figure 2 

We have taken a distribution 

f{x) =3.0aT 4 -° + 1.0ar 2 - 5 . (28) 

The random data with iV ranc i = 10000 has been collected in the interval 
Dl = 10 and Dr = 30. The original Hill estimate and the improved estimate 
are plotted in terms of I ranging up to 10000 in figure 2. The continuous line 
is the original Hill plot, the dotted line is the improved plot and the dashed 
line is the expected asymptotic value /j = 2.5. 

Example (16). Figure 3 

In figure 3, the distribution is taken as ln(x)/x) related to example (10). The 
random data with iV ran d = 10000 has been collected in the interval Dl = 100 
and Dr = 400. The continuous line is the original Hill plot, the dotted line 
is the improved plot and the dashed line is the expected asymptotic value 
/i = 1 (regardless of the ln(x)). In fact one sees that when k reaches about 
4000, the value of hi is close to 1. For larger values of k, \i decreases below 
fj, = 1. This is due to the fact that the slowly varying function \n(x) in the 
numerator leads to a slightly decreased value of \i whose effects increases 
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with k. Indeed, since \n(x) is an increasing function of x, the distribution is 
expected to decrease somewhat more slowly than 1/x. 

Example (17). Figure 4 

In figure 4, the distribution is taken as \j\fx related to examples (4), (5), 
(6). The random data with N Tan ^ = 10000 has been collected in the interval 
Dl = 3 and Dr = 1500. The continuous line is the original Hill plot, 
the dotted line is the improved plot and the dashed line is the expected 
asymptotic value \i = 0.5. 

Figure 1: A Pade [0,4] behavior (see Example 14) 
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Figure 2: A sum of two power like behaviors (see Example 15) 



14- 


1 1 




12- 


1 1 




10- 
6 - 


& \ 




6 - 


i v. 




4 - 


^, ^ -^^_ 










2 - 






n - 


i i i i 





2000 



4000 



6000 



6000 



10000 



Hill Estimator 
Improved Estimator 
expected valu&-1 



16 



Figure 3: A ln(x)/x behavior (see Example 16) 
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Figure 4: A \j\fx behavior (see Example 17) 
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6 Conclusion 

We have shown that, by a very simple alteration, the Hill estimator, which 
provides a useful inference for the power behavior of the tail of a distribution, 
can be easily improved to cover cases where it performs badly. This includes 
cases when the tail is not of the form of a power law but is multiplied by 
a slowly varying functions g(x) including logarithms for example. It also 
applies when the power fj, is of the order or smaller than one. It even works 
for inferring the tails of increasing distributions i.e. when \x is negative. 

Needless to say, the approach outlined in this paper is one of efficiency. 
From the frequentist point of view, it is perfectly justified. However, in some 
way, the Bayesian point of view is also, but granted not completely, met as 
the theoretical and empirical knowledge of extreme tails of the distribution 
is somehow better taken into account. It should be stressed that the new 
estimator applies particularly when the data is confined by artificial external 
conditions to lie within a finite domain bounded not only by a lowest value 
of the variable but also by a highest value. 
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